Operation management device, operation management method

ABSTRACT

An operation management apparatus that makes it easy to grasp a metric in which a continuous abnormality has occurred in a system is provided. An operation management apparatus (100) includes a metric collection unit (101) and an abnormality score calculation unit (104). The metric collection unit (101) sequentially collects a measurement value of each of a plurality of metrics in a system. The abnormality score calculation unit (104) calculates and outputs, on the basis of a continuity level indicating a degree of continuity of an abnormality of the measurement value for each of the plurality of metrics at each time, an abnormality score for the metric.

TECHNICAL FIELD

The invention relates to an operation management apparatus, an operation management method, and a program, and in particular to an operation management apparatus, an operation management method, and a program which detect an abnormality of a system.

BACKGROUND ART

In an IT (Information Technology) system, a manager monitors the system and, when the occurrence of a system abnormality is recognized, avoids a fatal situation such as a system shutdown by prioritizing the detected abnormalities.

Patent literatures 1 and 2 describe examples of operation management systems that detect an abnormality of an IT system. The operation management systems described in patent literatures 1 and 2 detect a correlation for each combination of metrics on the basis of measurement values of a plurality of metrics (performance indexes) of the system, and generate a correlation model. Using the generated correlation model, the operation management systems determine whether or not correlation destruction has occurred with respect to inputted measurement values of the metrics, and thereby detect a system abnormality.

The operation management system outputs a graph showing the number of correlation destructions over time, which a manager uses to determine whether a system abnormality has occurred. A list of the metrics in which an abnormality is detected (abnormal metrics) is also outputted, together with abnormality scores, as details of the abnormality at a given time.

As a related technology, patent literature 3 discloses a monitoring device which detects a monitoring item and a threshold value from resource items of a system by using a statistical technique.

CITATION LIST

Patent Literature

[Patent literature 1] Japanese Patent Application Laid-Open 2009-199533

[Patent literature 2] Japanese Patent Application Laid-Open 2010-186310

[Patent literature 3] Japanese Patent Application Laid-Open 2003-263342

SUMMARY OF INVENTION

Technical Problem

In the operation management systems described in patent literatures 1 and 2, as the system size increases, the number of metrics increases and a large number of abnormal metrics are presented. The abnormal metrics include metrics in which an abnormality of low importance level (emergency level) has occurred, that is, an abnormality that a manager does not need to take notice of, such as an abnormality which disappears in a short time due to background processing or the like. Therefore, when a large number of abnormal metrics have occurred, it becomes difficult to grasp a metric in which an abnormality of high importance level (emergency level), such as an abnormality that continues for a long time, has occurred.

An object of the invention is to solve the above-mentioned problem and to provide an operation management apparatus, an operation management method, and a program which make it easy to grasp a metric in which a continuous abnormality has occurred in a system.

Solution to Problem

An operation management apparatus according to an exemplary aspect of the invention includes: a metric collection means for collecting a measurement value of each of a plurality of metrics in a system sequentially; and an abnormality score calculation means for calculating and outputting, on the basis of a continuity level indicating a degree of continuity of an abnormality of the measurement value for each of the plurality of metrics at each time, an abnormality score for the metric.

An operation management method according to an exemplary aspect of the invention includes: collecting a measurement value of each of a plurality of metrics in a system sequentially; and calculating and outputting, on the basis of a continuity level indicating a degree of continuity of an abnormality of the measurement value for each of the plurality of metrics at each time, an abnormality score for the metric.

A computer readable storage medium according to an exemplary aspect of the invention records thereon a program causing a computer to perform a method including: collecting a measurement value of each of a plurality of metrics in a system sequentially; and calculating and outputting, on the basis of a continuity level indicating a degree of continuity of an abnormality of the measurement value for each of the plurality of metrics at each time, an abnormality score for the metric.

Advantageous Effect of Invention

The advantageous effect of the invention is that a continuous abnormality of a metric in a system can be grasped easily.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram showing a characteristic configuration according to a first exemplary embodiment of the present invention.

FIG. 2 is a block diagram showing a configuration of an operation management system to which an operation management apparatus 100 of the first exemplary embodiment of the present invention is applied.

FIG. 3 is a flowchart showing processing of the operation management apparatus 100 in the first exemplary embodiment of the present invention.

FIG. 4 is a diagram showing an example of sequential performance information 121 in the first exemplary embodiment of the present invention.

FIG. 5 is a diagram showing an example of a correlation model 122 in the first exemplary embodiment of the present invention.

FIG. 6 is a diagram showing an example of a residual error in the first exemplary embodiment of the present invention.

FIG. 7 is a diagram showing an example of correlation change information 123 in the first exemplary embodiment of the present invention.

FIG. 8 is a diagram showing a calculation process of an abnormality score in the first exemplary embodiment of the present invention.

FIG. 9 is a diagram showing a calculation result of the abnormality score in the first exemplary embodiment of the present invention.

FIG. 10 is a diagram showing an example of an analysis result 130 in the first exemplary embodiment of the present invention.

FIG. 11 is a flowchart showing processing of an operation management apparatus 100 in a second exemplary embodiment of the present invention.

FIG. 12 is a diagram showing a calculation result of a group abnormality score in the second exemplary embodiment of the present invention.

FIG. 13 is a diagram showing an example of an analysis result 140 in the second exemplary embodiment of the present invention.

DESCRIPTION OF EMBODIMENTS

First Exemplary Embodiment

Next, a first exemplary embodiment of the present invention will be described.

Initially, a configuration of the first exemplary embodiment of the present invention will be described. FIG. 2 is a block diagram showing a configuration of an operation management system to which an operation management apparatus 100 of the first exemplary embodiment of the present invention is applied.

Referring to FIG. 2, the operation management system of the first exemplary embodiment of the present invention includes an operation management apparatus 100, one or more monitored apparatuses 200, and a monitoring terminal 300. The operation management apparatus 100 and the monitored apparatus 200 are connected by a network. The operation management apparatus 100 and the monitoring terminal 300 are also connected by a network.

The monitored apparatus 200 is an apparatus, such as a Web server or a database server, which composes a system. Each monitored apparatus 200 includes a monitoring agent 201.

The monitoring agent 201 of the monitored apparatus 200 measures actual measurement data (measurement values) of performance values of a plurality of items in the monitored apparatus 200 at a regular time interval, and sends the measurement data to the operation management apparatus 100. As an item of the performance value, a computer resource usage rate or a computer resource amount, such as a CPU (Central Processing Unit) usage rate, a memory usage rate, or a disk access frequency, is used.

Here, a set of the monitored apparatus 200 and the item of a performance value is defined as a metric (performance index), and a set of values of a plurality of metrics which are measured at the same time is defined as performance information. The value of a metric is represented by an integer or a decimal number. The metric corresponds to the element in patent literature 1.

The operation management apparatus 100 generates a correlation model 122 for the monitored apparatus 200 on the basis of the performance information collected from the monitored apparatus 200 which is a monitoring target, and detects a failure or an abnormality of the monitored apparatus 200 using the generated correlation model 122.

The operation management apparatus 100 includes a metric collection unit 101, a correlation model generation unit 102, a correlation change analysis unit 103, an abnormality score calculation unit 104, a metric storage unit 111, a correlation model storage unit 112, and a correlation change storage unit 113.

The metric collection unit 101 collects the performance information from the monitored apparatus 200, and stores the time-series change thereof in the metric storage unit 111 as sequential performance information 121.

The correlation model generation unit 102 generates the correlation model 122 of the system composed of the monitored apparatuses 200 on the basis of the sequential performance information 121.

The correlation model storage unit 112 stores the correlation model 122 generated by the correlation model generation unit 102.

The correlation change analysis unit 103 detects an abnormality of a correlation for each combination of metrics included in the correlation model 122, with respect to newly inputted performance information, as described in patent literature 1.

The correlation change storage unit 113 stores the detection result of the abnormality of a correlation by the correlation change analysis unit 103, as correlation change information 123.

The abnormality score calculation unit 104 calculates an abnormality score of each metric on the basis of the correlation change information 123 and outputs the calculated abnormality score to the monitoring terminal 300.

The monitoring terminal 300 is a terminal through which a manager or the like gives the operation management apparatus 100 instructions for detecting a failure or an abnormality of the monitored apparatus 200, and which outputs the detection result. The monitoring terminal 300 includes a display unit 301.

The display unit 301 of the monitoring terminal 300, which is a display device such as a display, outputs (displays) the abnormality score outputted by the operation management apparatus 100 to the manager or the like by using a display screen.

Note that the operation management apparatus 100 may be a computer which includes a CPU and a storage medium storing programs and operates according to the control based on the programs. Moreover, the metric storage unit 111, the correlation model storage unit 112, and the correlation change storage unit 113 may be separate from each other or may be included in one storage medium.

Next, an operation of the operation management apparatus 100 of the first exemplary embodiment of the present invention will be explained.

FIG. 3 is a flowchart showing processing of the operation management apparatus 100 of the first exemplary embodiment of the present invention.

Initially, the metric collection unit 101 of the operation management apparatus 100 collects performance information measured by the monitoring agent 201 on the monitored apparatus 200, and stores the collected performance information in the metric storage unit 111 (step S101).

FIG. 4 is a diagram showing an example of the sequential performance information 121 of the first exemplary embodiment of the present invention. In the example of FIG. 4, the sequential performance information 121 includes a time-series change of measurement values of metrics x_1, x_2, x_3, . . . (hereinafter, a character following “_” indicates a suffix).

For example, the metric collection unit 101 stores the sequential performance information 121 shown in FIG. 4.

Next, the correlation model generation unit 102 refers to the sequential performance information 121 of the metric storage unit 111, generates the correlation model 122 on the basis of performance information in a predetermined modeling period specified by the manager or the like, and stores the generated correlation model 122 in the correlation model storage unit 112 (step S102).

The correlation model 122 includes a correlation function (or conversion function) and a threshold value for each of all combinations of two metrics in a plurality of metrics.

The correlation function is a function showing a correlation, with respect to time-series data of measurement values in the predetermined modeling period (t_s≦t≦t_e, t represents time), for each combination of metrics by using a predetermined approximation formula. If the correlation function of a correlation from metric x_i to metric x_j is described as f_i,j, an estimation value of one metric x_j of the combination is represented, based on measurement values of the other metric x_i, by using the correlation function f_i,j, as in Equation 1.

$\text{estimation value of } x_j(t) = f_{i,j}\left(x_i(t, t-1, t-2, \ldots)\right)$  [Equation 1]

The correlation model generation unit 102 determines a coefficient of the correlation function for each combination of metrics on the basis of the sequential performance information 121 in the predetermined modeling period. The coefficient of the correlation function is determined through system identification processing on a time-series of the measurement values of the metrics, as described in patent literature 1.

The threshold value is the maximum value of a residual error (conversion error, or prediction error) due to the correlation function in the predetermined modeling period with respect to each combination of metrics. The residual error is an absolute value of a difference between an estimation value of the metric calculated by using the correlation function and a measurement value of the metric.

The threshold value Th_i,j and the residual error d_i,j(t) for the correlation from metric x_i to metric x_j are given by Equation 2.

$Th_{i,j} = \max_{t_s \le t \le t_e}\left(d_{i,j}(t)\right)$

$d_{i,j}(t) = \mathrm{abs}\left(x_j(t) - f_{i,j}\left(x_i(t, t-1, t-2, \ldots)\right)\right)$  [Equation 2]

Note that abs( ) denotes the absolute value of the value in the parentheses.

Here, it is assumed that, as long as the monitored apparatus 200 is in normal operation, the value of the residual error d_i,j(t) is extremely small and does not exceed the threshold value Th_i,j.
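The relationship between the residual error and the threshold value in Equation 2 can be illustrated with a minimal sketch. The linear correlation function `f_12`, the sample values, and the function names below are assumptions made only for illustration; the correlation function of the embodiment is obtained by system identification as described in patent literature 1 and may also depend on past values of x_i, which this sketch ignores.

```python
from typing import Callable, Sequence


def residual(f: Callable[[float], float], x_i: float, x_j: float) -> float:
    """Residual error d_i,j(t): |measurement of x_j - estimation of x_j|."""
    return abs(x_j - f(x_i))


def threshold(f: Callable[[float], float],
              xs_i: Sequence[float], xs_j: Sequence[float]) -> float:
    """Threshold Th_i,j: maximum residual error over the modeling period."""
    return max(residual(f, a, b) for a, b in zip(xs_i, xs_j))


def f_12(x: float) -> float:
    """Hypothetical correlation function from x_1 to x_2 (assumed linear)."""
    return 2.0 * x + 1.0


# Hypothetical measurement values in the modeling period t_s..t_e.
xs_1 = [1.0, 2.0, 3.0, 4.0]
xs_2 = [3.5, 5.0, 6.5, 9.0]

th_12 = threshold(f_12, xs_1, xs_2)
print(th_12)  # 0.5
```

With these assumed values the residual errors are 0.5, 0, 0.5, and 0, so the threshold value is 0.5.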

The correlation model generation unit 102 may calculate a weight of a correlation function for each combination of metrics on the basis of a residual error in the predetermined modeling period, and determine a set of the correlation functions whose weight is equal to or greater than a predetermined value and threshold values thereof, as the correlation model 122, as described in patent literature 1.

FIG. 5 is a diagram showing an example of the correlation model 122 of the first exemplary embodiment of the present invention. In the example of FIG. 5, the correlation model 122 includes correlation functions among metrics x_1, x_2, x_3, . . . , and the threshold values.

For example, the correlation model generation unit 102 generates the correlation model 122 shown in FIG. 5 on the basis of the sequential performance information 121 of FIG. 4.

Next, the correlation change analysis unit 103 detects an abnormality of a correlation included in the correlation model 122 with respect to performance information newly collected by the metric collection unit 101 at each time, and stores the correlation change information 123 in the correlation change storage unit 113 (step S103).

Here, the correlation change analysis unit 103 determines whether an abnormality (correlation destruction) has occurred or not for each correlation included in the correlation model 122, with respect to the newly inputted performance information, as described in patent literature 1.

An abnormality level showing a degree of the abnormality of the correlation is indicated by a residual error calculated by using the newly inputted performance information and the correlation model 122. The correlation change analysis unit 103 determines whether an abnormality of a correlation from metric x_i to metric x_j has occurred on the basis of Equation 3, using the residual error and the threshold value.

$d_{i,j}(t)/Th_{i,j} \ge 1$ . . . abnormal

$d_{i,j}(t)/Th_{i,j} < 1$ . . . not abnormal  [Equation 3]

FIG. 6 is a diagram showing an example of a residual error in the first exemplary embodiment of the present invention. In the example of FIG. 6, in a case where the ratio of the residual error between the estimation value and the measurement value of metric x_j to the threshold value exceeds 1, it is determined that an abnormality has occurred in the correlation from metric x_i to metric x_j.

FIG. 7 is a diagram showing an example of the correlation change information 123 in the first exemplary embodiment of the present invention. The correlation change information 123 includes a ratio of the residual error to the threshold value (d_i,j(t)/Th_i,j) at each time and information indicating whether an abnormality has occurred or not.

For example, the correlation change analysis unit 103 detects an abnormality of a correlation included in the correlation model 122 shown in FIG. 5 with respect to the newly collected performance information and stores the correlation change information 123 shown in FIG. 7.
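The determination of Equation 3 and the content of one entry of the correlation change information 123 can be sketched as follows. The record layout, the field names, and the numeric values are hypothetical; the source only states that the ratio and an abnormality flag are stored at each time. A positive threshold value is assumed.

```python
from dataclasses import dataclass


@dataclass
class CorrelationChange:
    """One hypothetical entry of the correlation change information."""
    time: str
    ratio: float    # d_i,j(t) / Th_i,j
    abnormal: bool  # True when the ratio is 1 or more (Equation 3)


def detect(time: str, residual_error: float, threshold: float) -> CorrelationChange:
    """Apply Equation 3 to one residual error (threshold assumed > 0)."""
    ratio = residual_error / threshold
    return CorrelationChange(time, ratio, ratio >= 1.0)


# Hypothetical residual errors for one correlation, with Th_i,j = 0.5.
entries = [detect("12:10", 0.25, 0.5), detect("12:20", 1.5, 0.5)]
for entry in entries:
    print(entry.time, entry.ratio, entry.abnormal)
```

Storing the ratio rather than only the flag keeps the abnormality level available for the abnormality score calculation that follows.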

The abnormality score calculation unit 104 calculates an abnormality score of each metric on the basis of the correlation change information 123 at each time (step S104).

Here, the abnormality score is calculated on the basis of the abnormality level of the correlation (residual error) related to each metric and a degree of continuity of the abnormality. The abnormality score calculation unit 104 calculates the abnormality score S_i(t) of metric x_i on the basis of Equation 4.

$\begin{matrix}{{{S_{i}(t)} = {{average}_{i}( {s_{i,j}(t)} )}}{{s_{i,j}(t)} = {{{d_{i,j}(t)}/{Th}_{i,j}} \times {{step}( {{d_{i,j}(t)}/{Th}_{i,j}} )} \times {c_{i,j}(t)}}}{{{step}(y)} = \{ \begin{matrix}1 & {{if}\mspace{14mu} ( {y \geq 1} )} \\0 & {else}\end{matrix} }} & \lbrack {{Equation}\mspace{14mu} 4} \rbrack\end{matrix}$

Here, average_i( ) denotes the average of the values in the parentheses, taken over all of the correlations between metric x_i and the other metrics with which metric x_i has a correlation. For example, when correlations between metric x_1 and metrics x_2, x_3, x_4 exist, an average value of the values in the parentheses for these correlations is calculated.

c_i,j(t) is an abnormality continuity level indicating a degree of continuity of the abnormality, that is, the ratio of the length of time during which the abnormality of the correlation is detected to a predetermined length of time period ending at time t.

step(y) is a step function, and is equal to 0 when the ratio of the residual error to the threshold value used in Equation 3 is less than 1, that is, when the correlation is normal. Therefore, when all the correlations between metric x_i and the other metrics with which metric x_i has a correlation are normal, the abnormality score S_i(t) is equal to 0.
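Equation 4 can be illustrated with a minimal sketch. It assumes that ratios are sampled at a regular interval, so that the continuity level can be approximated as the fraction of the last `window` samples that were abnormal; the function names, the window length, and the ratio histories are hypothetical.

```python
from typing import Sequence


def step(y: float) -> int:
    """step(y) of Equation 4: 1 if y >= 1, else 0."""
    return 1 if y >= 1 else 0


def continuity_level(ratios: Sequence[float], window: int) -> float:
    """c_i,j(t): share of the last `window` samples (ending at t) that were
    abnormal. Assumes regular sampling and window > 0."""
    recent = list(ratios)[-window:]
    return sum(step(r) for r in recent) / window


def correlation_score(ratios: Sequence[float], window: int) -> float:
    """s_i,j(t): ratio at time t, gated by step() and scaled by continuity."""
    y = ratios[-1]
    return y * step(y) * continuity_level(ratios, window)


def abnormality_score(histories: Sequence[Sequence[float]], window: int) -> float:
    """S_i(t): average of s_i,j(t) over the correlations of metric x_i."""
    scores = [correlation_score(h, window) for h in histories]
    return sum(scores) / len(scores)


# Hypothetical histories of d/Th for the correlations of metric x_1.
history_12 = [0.5, 2.0, 2.0, 2.0]  # abnormal in the last 3 of 4 samples
history_13 = [0.5, 0.5, 0.5, 0.5]  # normal throughout
score_x1 = abnormality_score([history_12, history_13], window=4)
print(score_x1)  # 0.75
```

Under these assumptions the same ratio yields a larger score the longer the abnormality has lasted within the window, which is the behavior the continuity level is intended to produce.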

FIG. 8 is a diagram showing a calculation process of the abnormality score in the first exemplary embodiment of the present invention. FIG. 9 is a diagram showing a calculation result of the abnormality score in the first exemplary embodiment of the present invention.

For example, the abnormality score calculation unit 104 calculates the abnormality continuity level shown in FIG. 8, at each time, on the basis of the correlation change information 123 shown in FIG. 7, and calculates the abnormality score shown in FIG. 9.

In the calculation result of the abnormality score in FIG. 9, at 12:30, for example, the abnormality score of metric x_1 is greater than the abnormality scores of metric x_2 and metric x_3. It is therefore likely that the abnormality level of metric x_1 is large or that the abnormality of metric x_1 continues to exist.

Next, the abnormality score calculation unit 104 generates an analysis result 130 including an abnormality score of each metric and outputs the generated analysis result 130 to the monitoring terminal 300, at each time (step S105). The display unit 301 of the monitoring terminal 300 displays the analysis result 130 to a manager or the like.

FIG. 10 is a diagram showing an example of the analysis result 130 in the first exemplary embodiment of the present invention. In the example in FIG. 10, the analysis result 130 includes an abnormality correlation ratio display unit 131, an abnormality correlation display unit 132, and an abnormality score display unit 133.

For example, the abnormality score calculation unit 104 transmits the analysis result 130 shown in FIG. 10 to the monitoring terminal 300.

The abnormality correlation ratio display unit 131 displays, over time, the ratio of the number of correlations determined to be abnormal to the number of correlations included in the correlation model 122. The manager or the like can grasp the time at which abnormalities of a large number of correlations have occurred in the monitored apparatus 200, by referring to the abnormality correlation ratio display unit 131.

The abnormality correlation display unit 132 displays the correlations determined to be abnormal in the correlation model 122. In the abnormality correlation display unit 132, each metric in the correlation model 122 is represented by a circle with an identifier (name) of the metric, and a correlation determined to be abnormal is represented by a solid line connecting one circle with another circle. The abnormality correlation display unit 132 displays the correlations determined to be abnormal with respect to the time designated by the manager or the like on the abnormality correlation ratio display unit 131, for example. The abnormality correlation display unit 132 may display the correlations determined to be abnormal with respect to the latest collection time, every time new performance information is collected. A manager can grasp a metric on which abnormalities are concentrated in the monitored apparatus 200 by referring to the abnormality correlation display unit 132.

The abnormality score display unit 133 displays an abnormality score of each metric. Each metric is represented as a circle with an identifier of the metric in a predetermined rectangular area, and as the abnormality score becomes larger, the size (radius) of the circle becomes larger. In addition, as the abnormality score becomes larger, the circle is displayed at a higher position in the rectangular area.

The abnormality score calculation unit 104 determines a size of the circle, which becomes larger as the abnormality score becomes larger, and a height of the circle above the base of the rectangular area, measured along the axis perpendicular to the base, which becomes higher as the abnormality score becomes larger. Then, the abnormality score calculation unit 104 generates data to display the abnormality score display unit 133 showing the circle with the determined size and height.
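One possible way to derive the size and height of a circle from an abnormality score is sketched below. The source does not specify the mapping; the linear scaling, the clamping, the parameter names, and the default radii are all assumptions made for illustration.

```python
from typing import Tuple


def circle_layout(score: float, max_score: float, area_height: float,
                  min_radius: float = 5.0,
                  max_radius: float = 40.0) -> Tuple[float, float]:
    """Return (radius, height of the circle center above the base).

    Hypothetical linear mapping: both values grow with the score, and the
    score is clamped to the range 0..max_score (max_score assumed > 0).
    """
    fraction = min(max(score / max_score, 0.0), 1.0)
    radius = min_radius + fraction * (max_radius - min_radius)
    height = fraction * area_height
    return radius, height


print(circle_layout(0.0, 4.0, 200.0))  # (5.0, 0.0)
print(circle_layout(2.0, 4.0, 200.0))  # (22.5, 100.0)
print(circle_layout(4.0, 4.0, 200.0))  # (40.0, 200.0)
```

Any monotonically increasing mapping would satisfy the description; a linear one is used here only because it is the simplest to check.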

The abnormality score display unit 133 may display the abnormality score, with respect to the time designated by the manager or the like on the abnormality correlation ratio display unit 131, for example. The abnormality score display unit 133 may display the abnormality score with respect to the latest collection time, every time new information is collected.

As described in Equation 4, the abnormality score is calculated by multiplying an abnormality level (residual error) of a correlation by an abnormality continuity level. Since the abnormality continuity level is the ratio of the length of time during which the abnormality is detected to a predetermined length of time period ending at the time at which the abnormality score is calculated, the abnormality score gradually increases as time passes if the abnormality continues to exist, and gradually decreases as time passes if the abnormality has disappeared. In the abnormality score display unit 133, if the abnormality of a metric continues to exist, the circle of the metric moves upward in the rectangular area while gradually enlarging, and if the abnormality has disappeared, the circle of the metric moves downward in the rectangular area while gradually shrinking. That is, in the abnormality score display unit 133, the abnormality score of each metric is displayed as a movement which is similar to the movement of an object having a buoyant force, such as a balloon or a bubble. The upward movement and the downward movement of the circle in the rectangular area may follow Archimedes' principle.

In the analysis result 130 in FIG. 10, at 12:30, the circle representing metric x_1 is larger than the circles of the other metrics, and is displayed at the upper part. Thereby, a manager can easily grasp that it is likely that the abnormality level of metric x_1 is large or that the abnormality of metric x_1 continues to exist.

In the calculation result of the abnormality score in FIG. 9, the abnormality of metric x_1 continues to exist from 12:20 to 12:40, and the abnormality score reaches its peak at 12:30. In this case, in the abnormality score display unit 133, the circle representing metric x_1 moves upward while enlarging from 12:20 to 12:30, and moves downward while shrinking from 12:30 to 12:40. Thereby, the manager or the like can easily grasp the start or stop of the continuous abnormality of metric x_1.

As described above, the operation of the first exemplary embodiment of the present invention is completed.

Next, a characteristic configuration of the first exemplary embodiment of the present invention will be described. FIG. 1 is a block diagram showing a characteristic configuration according to the first exemplary embodiment of the present invention.

Referring to FIG. 1, an operation management apparatus 100 includes a metric collection unit 101 and an abnormality score calculation unit 104.

The metric collection unit 101 collects a measurement value of each of a plurality of metrics in a system sequentially. The abnormality score calculation unit 104 calculates and outputs, on the basis of a continuity level indicating a degree of continuity of an abnormality of the measurement value for each of the plurality of metrics at each time, an abnormality score for the metric.

Next, an advantageous effect of the first exemplary embodiment of the present invention will be described.

In the technology described in patent literature 1, the abnormality score of metric x_i at a given time is calculated on the basis of the number of correlations which are determined to be abnormal at the time among the correlations from metric x_i to the other metrics, and a list of the abnormal metrics is displayed with abnormality scores. A manager or the like preferentially handles the abnormality of the metric with a high abnormality score on the basis of the list of the abnormal metrics. In this case, since the manager or the like cannot grasp whether the abnormality of the metric is continuous or temporary, the manager or the like may preferentially handle the abnormality of the metric even if the abnormality of the metric is temporary. In order to grasp whether the abnormality of the metric is continuous or temporary, the manager or the like has to compare, for example, the list of the abnormal metrics at the time with the lists thereof before and after the time.

According to the first exemplary embodiment of the present invention, it is possible to grasp the continuous abnormality of the metric in the system easily. The reason is that the abnormality score calculation unit 104 calculates the abnormality score of the metric on the basis of a continuity level indicating a degree of continuity of the abnormality of the metric at each time. Further, the reason is that the abnormality score calculation unit 104 displays the abnormality score of each metric by using a figure with a size and a display position depending on the abnormality score.

Thereby, the manager or the like can preferentially handle the continuous abnormality of the metric, and a more stable operation of the system is expected compared with the case in which only the technology of patent literature 1 is used.

In addition, in order to investigate whether or not the abnormality of the metric is continuous, the manager or the like does not need to compare the lists of the abnormal metrics at each time, and a reduction of the manager's burden for grasping the continuous abnormality and prevention of overlooking are expected.

According to the first exemplary embodiment of the present invention, the start or stop of a continuous abnormality of the metric can be grasped easily. The reason is that the abnormality score calculation unit 104 displays the abnormality score of each metric by using a figure with a size and a display position depending on the abnormality score.

Second Exemplary Embodiment

Next, a second exemplary embodiment of the present invention will be described.

The second exemplary embodiment of the present invention differs from the first exemplary embodiment in that the abnormality score calculation unit 104 groups metrics having the same start time of abnormality detection.

A configuration of the second exemplary embodiment of the present invention is the same as that of the first exemplary embodiment of the present invention.

Next, an operation of the operation management apparatus 100 in the second exemplary embodiment of the present invention will be explained.

FIG. 11 is a flowchart showing processing of the operation management apparatus 100 in the second exemplary embodiment of the present invention. Operations from collection of performance information by the metric collection unit 101 to calculation of an abnormality score of each metric by the abnormality score calculation unit 104 (steps S201 to S204) are similar to the operations of the first exemplary embodiment of the present invention (steps S101 to S104).

The abnormality score calculation unit 104 refers to the abnormality score of each metric calculated in the above-mentioned step S204, and groups the metrics having the same abnormality detection start time (step S205). The metrics having the same abnormality detection start time are metrics whose abnormality score changes from 0 to a value greater than 0 at the same time. The abnormality score calculation unit 104 calculates a group abnormality score, which is an abnormality score for each group, by totalizing the abnormality scores of the metrics which are grouped (step S206).
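The grouping and totalizing of steps S205 and S206 can be sketched as follows. The data layout, the function name, and the score values are hypothetical and are not the values of FIG. 9; the sketch also assumes that a metric whose score is already greater than 0 at the first sample starts its abnormality at that sample.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_scores(scores: Dict[str, List[float]],
                 t: int) -> Dict[int, Tuple[List[str], float]]:
    """Group metrics that are abnormal at index t by the index at which their
    current abnormality started, and total the scores of each group."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for metric, series in scores.items():
        if series[t] <= 0:
            continue  # not abnormal at time t
        start = t
        # Walk back to the sample where the score first rose above 0.
        while start > 0 and series[start - 1] > 0:
            start -= 1
        groups[start].append(metric)
    return {start: (sorted(members), sum(scores[m][t] for m in members))
            for start, members in groups.items()}


times = ["12:10", "12:20", "12:30", "12:40"]
scores = {  # hypothetical abnormality scores aligned with `times`
    "x_1": [0.0, 1.0, 3.0, 1.0],
    "x_2": [0.0, 0.0, 0.5, 0.0],
    "x_3": [0.0, 0.0, 0.25, 0.0],
}
print(group_scores(scores, times.index("12:30")))
# {1: (['x_1'], 3.0), 2: (['x_2', 'x_3'], 0.75)}
```

In this assumed data, x_1 forms its own group starting at 12:20, while x_2 and x_3 form a group starting at 12:30 whose score is the sum of their individual scores.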

FIG. 12 is a diagram showing a calculation result of the groupabnormality score in the second exemplary embodiment of the presentinvention.

For example, the abnormality score calculation unit 104 calculates thegroup abnormality score shown in FIG. 12 on the basis of the abnormalityscore of FIG. 9, at each time.

In this case, the abnormality score calculation unit 104 generates, onthe basis of an abnormality score at 12:20, a group A composed of onlymetric x_1 whose abnormality detection starts at the time. The group Acontinues until 12:40. The abnormality score calculation unit 104generates, on the basis of the abnormality score at 12:30, a group Bcomposed of metric x_2 and metric x_3 whose abnormality detection startsat the time. The group B ends at 12:30. The abnormality scorecalculation unit 104 calculates, on the basis of the abnormality scoreof metrics included in each group, the group abnormality score shown inFIG. 12.

The calculation result of the group abnormality score in FIG. 12 shows that the abnormality detection start time of metric x_2 is the same as the abnormality detection start time of metric x_3. It is likely that the abnormality of metric x_2 and the abnormality of metric x_3 are caused by a common abnormal event, and that they are highly related to each other. The calculation result also shows that the abnormality detection start time of metric x_1 is different from the abnormality detection start time of metrics x_2 and x_3.

The abnormality score calculation unit 104 generates an analysis result 140 including a group abnormality score of each group at each time and outputs the generated analysis result 140 to the monitoring terminal 300 (step S207).

FIG. 13 is a diagram showing an example of the analysis result 140 in the second exemplary embodiment of the present invention. In the example of FIG. 13, the analysis result 140 includes an abnormality correlation rate display unit 141, an abnormality correlation display unit 142, and an abnormality score display unit 143.

The abnormality correlation rate display unit 141 and the abnormality correlation display unit 142 display a ratio of correlations determined to be abnormal and the correlations determined to be abnormal in the correlation model 122, respectively, in the same manner as the abnormality correlation rate display unit 131 and the abnormality correlation display unit 132 of the first exemplary embodiment of the present invention.

The abnormality score display unit 143 displays the group abnormality score of each group. Each group is displayed in a given rectangular area as a circle, with the identifiers of the metrics included in the group. As the group abnormality score becomes larger, the circle becomes larger and is displayed at a higher position in the rectangular area, as in the first exemplary embodiment of the present invention.
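As an illustration of this display rule, the following Python sketch maps a group abnormality score to a circle radius and a vertical position. The function `layout_circle`, its parameters, and the linear scaling are assumptions introduced only for this sketch; the description requires only that a larger score yields a larger circle displayed at a higher position.

```python
def layout_circle(score, max_score, area_height=100.0,
                  min_radius=4.0, max_radius=20.0):
    """Return (radius, height of the circle center above the bottom edge).

    The linear mapping is an assumption; any mapping that grows with the
    score would satisfy the described behavior.
    """
    if max_score <= 0:
        ratio = 0.0
    else:
        ratio = min(max(score / max_score, 0.0), 1.0)  # clamp to [0, 1]
    radius = min_radius + ratio * (max_radius - min_radius)
    # Keep the whole circle inside the rectangular area.
    height = radius + ratio * (area_height - 2.0 * radius)
    return radius, height


low = layout_circle(0.0, 10.0)
mid = layout_circle(5.0, 10.0)
high = layout_circle(10.0, 10.0)
print(low, mid, high)
```

In this sketch, a score of 0 places the smallest circle at the bottom of the area, and the maximum score places the largest circle at the top.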

In the analysis result 130 shown in FIG. 10 in the first exemplary embodiment of the present invention, since the abnormalities of metrics x_2 and x_3 do not have continuity and their abnormality scores are lower than that of metric x_1 at 12:30, the circles representing metrics x_2 and x_3 are displayed at the lower part of the abnormality score display unit 133 with a small size.

On the other hand, in the analysis result 140 shown in FIG. 13 in the second exemplary embodiment of the present invention, a circle representing a group composed of metrics x_2 and x_3 is displayed at the upper part of the abnormality score display unit 143 at 12:30, with a larger size than circles of the other groups, as is a circle representing a group composed of metric x_1. Thereby, the manager or the like can easily grasp that metric x_2 and metric x_3 are highly related to each other, that the sum of their abnormalities is large, or that the abnormalities continue to exist.

As described above, the operation of the second exemplary embodiment of the present invention is completed.

Next, an advantageous effect of the second exemplary embodiment of the present invention will be described.

When a plurality of metrics have abnormal values due to one abnormal event, abnormalities of the plurality of metrics may occur at the same time. In the technology of patent literature 1, in order to grasp such an abnormal event, a manager or the like has to confirm a temporal relation between the abnormal metrics, on the basis of a temporal change of an abnormal metrics list, for example, and extract abnormal metrics occurring at the same time. As another method to identify such an abnormal event, a method applying signature matching with respect to a correlation on which an abnormality occurs has also been proposed. However, the signature matching cannot be applied to an unknown abnormal event, since no signatures regarding correlations have been accumulated for the unknown abnormal event.

According to the second exemplary embodiment of the present invention, abnormal metrics occurring at the same time can be easily grasped. The reason is that the abnormality score calculation unit 104 groups the metrics having the same abnormality detection start time, and calculates the group abnormality score of each group.

Thereby, a manager or the like can quickly grasp the abnormality due to the common abnormal event, and expect a stable operation of the system.

While the invention has been particularly shown and described with reference to exemplary embodiments thereof, the invention is not limited to these embodiments. It will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the claims.

For example, in the exemplary embodiments of the present invention, a performance index in the system is defined as a metric, and the abnormality score of the metric is calculated. However, any index which is represented as a time series, such as requests from a client computer in the system or the number of cargoes processed per unit time through the system, can be used as a metric.

In the exemplary embodiments of the present invention, a degree of the abnormality of the correlation (correlation destruction) regarding the metric is used as a degree of the abnormality of the metric. However, another abnormality level, such as a degree of excess over a predetermined threshold for a value of the metric (threshold abnormality), can also be used.
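A threshold abnormality of this kind can be sketched as follows. The function name `threshold_excess`, the use of the raw difference as the degree of excess, and the sample values are assumptions made for illustration; the description does not specify how the degree of excess is computed.

```python
def threshold_excess(values, threshold):
    """Return, for each measured value, how far it exceeds the threshold.

    A value at or below the threshold yields 0.0 (no abnormality).
    Using the raw difference as the degree of excess is an assumption.
    """
    return [max(0.0, v - threshold) for v in values]


# Hypothetical measured values of one metric and a hypothetical threshold.
measured = [55.0, 82.0, 91.0, 78.0]
excess = threshold_excess(measured, threshold=80.0)
print(excess)
```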

In the exemplary embodiments of the present invention, the threshold value in the correlation model 122 is calculated with respect to each combination of metrics based on the maximum value of the residual error of the correlation function. However, the threshold value may be a predetermined value defined for each combination of metrics, or a predetermined value defined for the correlation model 122.

In the exemplary embodiments of the present invention, the abnormality score is calculated by using Equation 4. However, the abnormality score may be calculated by using other equations as long as the abnormality score increases depending on a continuity degree of the abnormality of the metric. For example, in Equation 4, the abnormality score may be calculated by using only the abnormality continuity level of the correlation regarding the metric, without using the abnormality level of the correlation (residual error) regarding the metric.

In the exemplary embodiments of the present invention, in the analysis results 130, 140, the abnormality score is indicated by the size of the circle representing the metric and the height at which the circle is displayed in the predetermined rectangular area. However, a figure with a different shape or a different display position may be used as long as the abnormality score is indicated. For example, the abnormality score may be indicated by a figure other than a circle, such as an ellipse or a sphere. The abnormality score may be indicated by a height from a predetermined reference position on a perpendicular axis which is defined on a predetermined shape other than a rectangle, such as a circle or a trapezoid.

The abnormality score may be indicated by a height from a predetermined reference position on an axis perpendicular to a horizontal plane. In this case, even if the display unit 301 is arranged to be inclined to the horizontal plane, a figure representing each metric moves upward or downward in a direction perpendicular to the horizontal plane depending on the abnormality score. That is, the movement of the figure representing each metric is similar to the movement of an object subject to a real buoyant force. Thereby, a manager can easily grasp a change in the abnormality score of each metric.

This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2011-182261, filed on Aug. 24, 2011, the disclosure of which is incorporated herein in its entirety by reference.

REFERENCE SIGNS LIST

-   100 operation management apparatus
-   101 metric collection unit
-   102 correlation model generation unit
-   103 correlation change analysis unit
-   104 abnormality score calculation unit
-   111 metric storage unit
-   112 correlation model storage unit
-   113 correlation change storage unit
-   121 sequential performance information
-   122 correlation model
-   123 correlation change information
-   130 analysis result
-   131 abnormality correlation ratio display unit
-   132 abnormality correlation display unit
-   133 abnormality score display unit
-   140 analysis result
-   141 abnormality correlation ratio display unit
-   142 abnormality correlation display unit
-   143 abnormality score display unit
-   200 monitored apparatus
-   201 monitoring agent
-   300 monitoring terminal

1-21. (canceled)
22. An operation management apparatus comprising: a processor; and a memory configured to store a program logic, the program logic comprising: an abnormality detection unit configured to detect, from among a plurality of metrics, one or more abnormal metrics; a clustering unit configured to generate one or more clusters, the clusters each including two or more of the abnormal metrics; and a display image generation unit configured to generate a display image including one or more first graphical symbols and one or more second graphical symbols, the first graphical symbols each corresponding to each of the clusters and the second graphical symbols each corresponding to one of the abnormal metrics not included in the clusters, wherein the first graphical symbol is displayed, in the display image, at a size determined based on abnormal metrics included in the cluster corresponding to the first graphical symbol.
23. The operation management apparatus according to claim 22, wherein the size of the first graphical symbol is determined larger than the size of the second graphical symbol.
24. The operation management apparatus according to claim 22, wherein the first graphical symbol is displayed at a first height determined by a first score of the corresponding cluster, the first score being determined based on an attribute value of the corresponding cluster.
25. The operation management apparatus according to claim 24, wherein the first graphical symbol is displayed at a higher position when the first score is larger.
26. The operation management apparatus according to claim 24, wherein the first score is determined based on the abnormal metrics included in the corresponding cluster.
27. The operation management apparatus according to claim 24, wherein the second graphical symbol is displayed at a second height determined by a second score of the corresponding abnormal metric, the second score being determined based on an attribute value of the corresponding abnormal metric.
28. The operation management apparatus according to claim 22, wherein the metrics are acquired from a plurality of data sources.
29. An operation management method comprising: detecting, from among a plurality of metrics, one or more abnormal metrics; generating one or more clusters, the clusters each including two or more of the abnormal metrics; and generating a display image including one or more first graphical symbols and one or more second graphical symbols, the first graphical symbols each corresponding to each of the clusters and the second graphical symbols each corresponding to one of the abnormal metrics not included in the clusters, wherein the first graphical symbol is displayed, in the display image, at a size determined based on abnormal metrics included in the cluster corresponding to the first graphical symbol.
30. The operation management method according to claim 29, wherein the size of the first graphical symbol is determined larger than the size of the second graphical symbol.
31. The operation management method according to claim 29, wherein the first graphical symbol is displayed at a first height determined by a first score of the corresponding cluster, the first score being determined based on an attribute value of the corresponding cluster.
32. The operation management method according to claim 31, wherein the first graphical symbol is displayed at a higher position when the first score is larger.
33. The operation management method according to claim 31, wherein the first score is determined based on the abnormal metrics included in the corresponding cluster.
34. The operation management method according to claim 31, wherein the second graphical symbol is displayed at a second height determined by a second score of the corresponding abnormal metric, the second score being determined based on an attribute value of the corresponding abnormal metric.
35. The operation management method according to claim 29, wherein the metrics are acquired from a plurality of data sources.
36. A non-transitory computer readable storage medium recording thereon a program, causing a computer to perform a method comprising: detecting, from among a plurality of metrics, one or more abnormal metrics; generating one or more clusters, the clusters each including two or more of the abnormal metrics; and generating a display image including one or more first graphical symbols and one or more second graphical symbols, the first graphical symbols each corresponding to each of the clusters and the second graphical symbols each corresponding to one of the abnormal metrics not included in the clusters, wherein the first graphical symbol is displayed, in the display image, at a size determined based on abnormal metrics included in the cluster corresponding to the first graphical symbol.
37. The non-transitory computer readable storage medium recording thereon the program according to claim 36, wherein the size of the first graphical symbol is determined larger than the size of the second graphical symbol.
38. The non-transitory computer readable storage medium recording thereon the program according to claim 36, wherein the first graphical symbol is displayed at a first height determined by a first score of the corresponding cluster, the first score being determined based on an attribute value of the corresponding cluster.
39. The non-transitory computer readable storage medium recording thereon the program according to claim 38, wherein the first graphical symbol is displayed at a higher position when the first score is larger.
40. The non-transitory computer readable storage medium recording thereon the program according to claim 38, wherein the first score is determined based on the abnormal metrics included in the corresponding cluster.
41. The non-transitory computer readable storage medium recording thereon the program according to claim 38, wherein the second graphical symbol is displayed at a second height determined by a second score of the corresponding abnormal metric, the second score being determined based on an attribute value of the corresponding abnormal metric.
42. The non-transitory computer readable storage medium recording thereon the program according to claim 36, wherein the metrics are acquired from a plurality of data sources.